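%load_ext autoreload
%autoreload 2
import os

import mpld3
# Render matplotlib figures as interactive mpld3 plots inside the notebook
mpld3.enable_notebook()

# Point the Chemical Checker to its configuration file before importing it,
# so the setting is already in place whenever the package reads it
os.environ['CC_CONFIG'] = 'config.json'
from package.cc import ChemicalChecker

cc_local = ChemicalChecker()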
We start by creating the space objects for the three spaces we want to explore. They give us access to the data needed to build the visualizations:
# Mechanism of Action (B1)
MoA = cc_local.get_signature('char4', 'full', 'B1.001')
# Therapeutic Areas (E1)
ATC = cc_local.get_signature('char4', 'full', 'E1.001')
# Side effects (E3)
side_effects = cc_local.get_signature('char4', 'full', 'E3.001')
These objects let us explore the data in more depth. Following the naming convention used in packages such as scikit-learn, they have a `fit` method that trains the instance and generates all the data needed for the visualizations. Unfortunately, fitting is data-intensive and computationally very expensive, because it performs a Fisher's exact test for every feature of every molecule. In a large space such as B4, this means roughly 2.9 billion tests (631,027 molecules × 4,635 features), not to mention the memory needed to store the results. For these reasons, the code is designed specifically to run on our HPC facilities.

Nonetheless, the visualizations can be generated from preprocessed data. To visualize a molecule, we only need to call the `predict` method with a query molecule, given as an InChIKey, a SMILES string or a molecule name. Here we analyse Atenolol, a beta-blocker used to treat high blood pressure and angina (chest pain caused by reduced blood flow to the heart).
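The notebook does not show how `fit` builds its contingency tables, so the sketch below is only illustrative. It implements a one-sided Fisher's exact test with the Python standard library, assuming a hypothetical 2x2 table that crosses "near the query vs. the rest" with "has the feature vs. does not". It also reproduces the back-of-the-envelope cost of the B4 example, assuming one 8-byte float is stored per test. The helper names `fisher_greater` and `estimate_cost` are our own and are not part of the Chemical Checker API.

```python
from math import comb
from typing import Tuple


def fisher_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided (enrichment) Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns P(X >= a), where X follows the hypergeometric distribution implied
    by the table's fixed row and column totals (the same quantity as
    scipy.stats.fisher_exact(table, alternative="greater")).
    """
    row1 = a + b            # size of the first group (hypothetical: molecules near the query)
    col1 = a + c            # molecules that carry the feature
    total = a + b + c + d
    # Sum the probabilities of all tables at least as extreme as the observed one
    tail = sum(
        comb(col1, x) * comb(total - col1, row1 - x)
        for x in range(a, min(row1, col1) + 1)
    )
    return tail / comb(total, row1)


def estimate_cost(n_molecules: int, n_features: int,
                  bytes_per_value: int = 8) -> Tuple[int, int]:
    """Number of tests (one per molecule-feature pair) and the bytes needed
    to keep one value per test (8 bytes = one float64, an assumption)."""
    n_tests = n_molecules * n_features
    return n_tests, n_tests * bytes_per_value


if __name__ == "__main__":
    print(fisher_greater(3, 0, 0, 3))  # 0.05
    n_tests, n_bytes = estimate_cost(631_027, 4_635)  # the B4 figures from the text
    print(f"{n_tests:,} tests, ~{n_bytes / 1e9:.1f} GB")  # 2,924,810,145 tests, ~23.4 GB
```

Under that storage assumption the p-values alone take about 23 GB, before counting intermediate data, which helps explain why the fitting step runs on an HPC cluster rather than in a notebook.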
%matplotlib inline
# Mechanism of action
_, df = MoA.predict('Atenolol')
df
| | Feature | Description | Score |
|---|---|---|---|
| 0 | P07550(1) | Favors Beta-2 adrenergic receptor | 1.00 |
| 1 | P08588(-1) | Against Beta-1 adrenergic receptor | 1.00 |
| 2 | Class:544(1) | Favors Adrenergic receptor | 1.00 |
| 3 | P07550(-1) | Against Beta-2 adrenergic receptor | 1.00 |
| 4 | Class:1266(1) | Favors Monoamine receptor | 1.00 |
| 5 | Class:544(-1) | Against Adrenergic receptor | 0.90 |
| 6 | Class:1088(1) | Favors Small molecule receptor (family A GPCR) | 0.88 |
| 7 | Class:1020(1) | Favors Family A G protein-coupled receptor | 0.65 |
| 8 | Class:11(1) | Favors Membrane receptor | 0.57 |
| 9 | P13945(1) | Favors Beta-3 adrenergic receptor | 0.51 |
| 10 | P08588(1) | Favors Beta-1 adrenergic receptor | 0.50 |
| 11 | Class:1266(-1) | Against Monoamine receptor | 0.35 |
| 12 | Class:1088(-1) | Against Small molecule receptor (family A GPCR) | 0.29 |
| 13 | Class:0(1) | Favors Protein class | 0.23 |
| 14 | P13945(-1) | Against Beta-3 adrenergic receptor | 0.23 |
| 15 | Class:1020(-1) | Against Family A G protein-coupled receptor | 0.20 |
| 16 | Class:11(-1) | Against Membrane receptor | 0.18 |
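Each label in the Feature column pairs an identifier (a UniProt accession such as P07550 or P08588, or a protein-class node such as Class:544) with a sign in parentheses, and the Description column shows that (1) reads as "Favors" and (-1) as "Against". Because several targets appear in both directions, it can help to regroup the rows by identifier. The helpers below are a hypothetical sketch, not part of the Chemical Checker; they rely only on the label format visible in this table.

```python
import re
from typing import Dict, List, Tuple

# Feature labels in the table look like "P07550(1)" or "Class:544(-1)"
_LABEL = re.compile(r"^(?P<target>.+)\((?P<sign>-?1)\)$")


def parse_feature(label: str) -> Tuple[str, str]:
    """Split a feature label into (identifier, direction).

    The direction mapping (1 -> "Favors", -1 -> "Against") is read off the
    Description column of the Atenolol table.
    """
    match = _LABEL.match(label)
    if match is None:
        raise ValueError(f"unrecognised feature label: {label!r}")
    direction = "Favors" if match.group("sign") == "1" else "Against"
    return match.group("target"), direction


def group_by_target(rows: List[Tuple[str, float]]) -> Dict[str, Dict[str, float]]:
    """Collect the scores of both directions under each identifier."""
    grouped: Dict[str, Dict[str, float]] = {}
    for label, score in rows:
        target, direction = parse_feature(label)
        grouped.setdefault(target, {})[direction] = score
    return grouped


if __name__ == "__main__":
    # A few (Feature, Score) pairs copied from the Atenolol table
    rows = [("P07550(1)", 1.00), ("P08588(-1)", 1.00),
            ("P07550(-1)", 1.00), ("P08588(1)", 0.50)]
    print(group_by_target(rows))
    # {'P07550': {'Favors': 1.0, 'Against': 1.0}, 'P08588': {'Against': 1.0, 'Favors': 0.5}}
```

With the real output, the same helper could be applied to the `Feature` column, for example `df['Feature'].map(parse_feature)`, assuming `df` is a pandas DataFrame with that column as displayed above.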